Improve pathname2url() and url2pathname() docs #127125

Merged
merged 4 commits into from
Nov 24, 2024

barneygale (Contributor) commented Nov 22, 2024

These functions have long sown confusion among Python developers. The existing documentation says they deal with URL path components, but that doesn't fit the evidence on Windows:

>>> pathname2url(r'C:\foo')
'///C:/foo'
>>> pathname2url(r'\\server\share')
'////server/share'  # or '//server/share' as of quite recently

If these were URL path components, they would imply complete URLs like file://///C:/foo and file://////server/share. Clearly this isn't right. Yet the implementation in nturl2path is deliberate, and the url2pathname() function correctly inverts it.

On non-Windows platforms, the behaviour until quite recently was to simply quote/unquote the path without adding or removing any leading slashes. This behaviour is compatible with both interpretations: 1) the value is a URL path component (existing docs), and 2) the value is everything following file: (this PR).

The conclusion I draw is that these functions operate on everything after the file: prefix, which may include an authority section. This is the only explanation that fits both the Windows and non-Windows behaviour. It's also a better match for the function names.
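
A minimal sketch of what this interpretation implies in practice: a complete file URL is formed by prepending `file:` to the return value of `pathname2url()`, and `url2pathname()` is given whatever follows that prefix. The helper names `path_to_file_url()` and `file_url_to_path()` are hypothetical and not part of urllib; the exact URL printed depends on the platform and Python version.

```python
import os
from urllib.request import pathname2url, url2pathname

FILE_PREFIX = "file:"


def path_to_file_url(path):
    """Hypothetical helper: build a complete file URL from a local path."""
    # pathname2url() returns everything that follows "file:", which may
    # include an authority section (leading slashes), so only the scheme
    # prefix is added here.
    return FILE_PREFIX + pathname2url(path)


def file_url_to_path(url):
    """Hypothetical helper: recover a local path from a file URL."""
    # Simplification: only a lowercase "file:" prefix is recognised.
    if not url.startswith(FILE_PREFIX):
        raise ValueError(f"not a file URL: {url!r}")
    # Strip only "file:" and pass the remainder to url2pathname().
    return url2pathname(url[len(FILE_PREFIX):])


local_path = os.path.abspath("example file.txt")
url = path_to_file_url(local_path)
round_tripped = file_url_to_path(url)

print(url)                          # the space is percent-encoded as %20
print(round_tripped == local_path)  # the two functions invert each other
```

Stripping only the `file:` prefix, rather than `file://`, is the point of the sketch: any slashes that belong to the authority section are left for `url2pathname()` to interpret.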


📚 Documentation preview 📚: https://cpython-previews--127125.org.readthedocs.build/

These functions have long sown confusion among Python developers. Even in
the urllib implementation and tests, they seem to be used in contradictory
ways. A test helper named `sanepathname2url()` has been with us since 2004!

The existing documentation says that these functions deal with URL path
components. But that doesn't fit the evidence on Windows:

    >>> pathname2url(r'C:\foo')
    '///C:/foo'
    >>> pathname2url(r'\\server\share')
    '////server/share'  # or '//server/share' as of quite recently

If these were URL path components, they would imply complete URLs like
`file://///C:/foo` and `file://////server/share`. Clearly this isn't right.

The conclusion I draw is that these functions operate on everything after
the `file:` prefix, which may include an authority section.

barneygale (Contributor, Author) commented Nov 23, 2024

I think the confusion came about because the original urllib author understood that file:/etc/hosts was the correct way to express /etc/hosts as a file URI. They weren't wrong per se, and it's still an acceptable form. But later on, it became much more prevalent to express that path as file:///etc/hosts, and some folks began to use pathname2url() and url2pathname() in a slightly different way. But the new definition never gelled with the Windows implementation, and traces of the old understanding can be found in the codebase and tests.

edit: actually, it might be due to a 90s-era misunderstanding between two devs
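
As a small illustration of the point that both spellings are acceptable, here is a sketch using `urllib.parse.urlsplit()` (chosen for illustration; it is not mentioned in the PR) showing that the two forms split into the same scheme and path, with an empty authority in each case.

```python
from urllib.parse import urlsplit

# The older spelling with no authority section, and the now more common
# spelling with an empty authority section ("//" followed by nothing).
short_form = urlsplit("file:/etc/hosts")
long_form = urlsplit("file:///etc/hosts")

for parts in (short_form, long_form):
    # Show the scheme, the authority (netloc) and the path of each URL.
    print(parts.scheme, repr(parts.netloc), parts.path)
```

Both URLs name the same path; they differ only in whether the empty authority is written out.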

serhiy-storchaka (Member) left a comment

LGTM.

@barneygale barneygale merged commit 307c633 into python:main Nov 24, 2024
29 checks passed
@miss-islington-app

Thanks @barneygale for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 24, 2024
These functions have long sown confusion among Python developers. The
existing documentation says they deal with URL path components, but that
doesn't fit the evidence on Windows:

    >>> pathname2url(r'C:\foo')
    '///C:/foo'
    >>> pathname2url(r'\\server\share')
    '////server/share'  # or '//server/share' as of quite recently

If these were URL path components, they would imply complete URLs like
`file://///C:/foo` and `file://////server/share`. Clearly this isn't right.
Yet the implementation in `nturl2path` is deliberate, and the
`url2pathname()` function correctly inverts it.

On non-Windows platforms, the behaviour until quite recently was to simply
quote/unquote the path without adding or removing any leading slashes. This
behaviour is compatible with *both* interpretations: 1) the value is a
URL path component (existing docs), and 2) the value is everything
following `file:` (this commit).

The conclusion I draw is that these functions operate on everything after
the `file:` prefix, which may include an authority section. This is the
only explanation that fits both the Windows and non-Windows behaviour.
It's also a better match for the function names.
(cherry picked from commit 307c633)

Co-authored-by: Barney Gale <[email protected]>
bedevere-app bot commented Nov 24, 2024

GH-127232 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 24, 2024
bedevere-app bot commented Nov 24, 2024

GH-127233 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Nov 24, 2024
barneygale added a commit that referenced this pull request Nov 24, 2024
…#127232)

Improve `pathname2url()` and `url2pathname()` docs (GH-127125)
barneygale added a commit that referenced this pull request Nov 24, 2024
…#127233)

Improve `pathname2url()` and `url2pathname()` docs (GH-127125)
Labels
docs Documentation in the Doc dir skip issue skip news
Projects
Status: Done
2 participants